An Adaptive Voice Activity Detection Algorithm
Abstract
Voice activity detection (VAD) is a crucial step in speech processing: its accuracy and speed directly affect the quality of subsequent processing. Some voice processing systems, such as those that operate over the telephone or in indoor environments, need a simple and fast VAD method. For these representative voice signals, this paper proposes a new adaptive and fast algorithm based on a major improvement to the dual-threshold endpoint detection algorithm. First, the original voice signal is amplitude-normalized, and the short-time amplitude is extracted as the feature, which simplifies computation. Then, large-scale (long frame length and frame shift) short-time amplitude is used for rough detection; combined with an adaptive threshold decision over consecutive frames, this quickly finds the regions that contain the start point and end point of the speech. Within these regions, small-scale (short frame length and frame shift) short-time amplitude is used for accurate detection: the start-point region is scanned forward and the end-point region is scanned in reverse, again with an adaptive threshold decision over consecutive frames, so that the start point and end point of the effective speech can be located accurately. Experimental results show that the proposed method detects the endpoints of voice signals more quickly and accurately, which can improve recognition performance dramatically. The large scale increases detection speed and the small scale improves detection accuracy, and both can be adjusted to satisfy different requirements. The method therefore ensures both detection speed and precision, which makes it more flexible and more widely applicable.
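The coarse-to-fine procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the threshold rule, frame sizes, or run length, so the threshold `min + ratio * (max - min)`, the default frame parameters, `min_run`, and all function names here are assumptions chosen for the example.

```python
def short_time_amplitude(x, frame_len, hop):
    """Mean absolute amplitude of each frame of x."""
    return [sum(abs(v) for v in x[i:i + frame_len]) / frame_len
            for i in range(0, len(x) - frame_len + 1, hop)]


def first_run(flags, min_run):
    """Index of the first frame that begins min_run consecutive True flags."""
    run = 0
    for i, flag in enumerate(flags):
        run = run + 1 if flag else 0
        if run == min_run:
            return i - min_run + 1
    return None


def scan(region, frame_len, hop, thr, min_run):
    """Sample offset of the first fine frame that starts a run above thr."""
    amps = short_time_amplitude(region, frame_len, hop)
    idx = first_run([a > thr for a in amps], min_run)
    return None if idx is None else idx * hop


def detect_endpoints(x, coarse=(400, 200), fine=(80, 40), ratio=0.1, min_run=3):
    """Return (start, end) sample indices of the speech, or None if none found."""
    # Step 1: amplitude normalization.
    peak = max((abs(v) for v in x), default=0.0)
    if peak == 0:
        return None
    x = [v / peak for v in x]

    # Step 2: rough detection with long frames and a long frame shift.
    c_len, c_hop = coarse
    amps = short_time_amplitude(x, c_len, c_hop)
    if not amps:
        return None
    # Assumed adaptive threshold: a fraction of the observed amplitude range.
    thr = min(amps) + ratio * (max(amps) - min(amps))
    flags = [a > thr for a in amps]
    cs = first_run(flags, min_run)            # coarse start frame
    if cs is None:
        return None
    ce = len(flags) - 1 - first_run(flags[::-1], min_run)  # coarse end frame

    # Step 3: accurate detection with short frames inside the two regions.
    f_len, f_hop = fine
    lo = max(0, (cs - 1) * c_hop)
    hi = min(len(x), cs * c_hop + c_len)
    off = scan(x[lo:hi], f_len, f_hop, thr, min_run)        # forward scan
    start = lo if off is None else lo + off

    lo = ce * c_hop
    hi = min(len(x), ce * c_hop + c_len + c_hop)
    off = scan(x[lo:hi][::-1], f_len, f_hop, thr, min_run)  # reverse scan
    end = hi if off is None else hi - off
    return start, end
```

Only the two short regions around the coarse boundaries are examined at the fine scale, which is where the speed advantage described in the abstract would come from. In this sketch the returned endpoints are aligned to fine-frame boundaries, so their resolution is limited by the fine frame length and shift.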
Similar resources
A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...
Voice Activity Detection Algorithm based on Improved Radial Basis Function Neural Network
Voice activity detection (VAD) is key to voice recognition, voice synthesis, and speech enhancement. To improve the accuracy and robustness of speech endpoint detection systems, this paper combines the advantages of the adaptive genetic algorithm (AGA) and an improved radial basis function (RBF) network to address defects in existing learning methods, and presents a comprehensive detection metho...
A Wavelet-Based Voice Activity Detection Algorithm in Variable-Level Noise Environment
In this paper, a novel entropy-based voice activity detection (VAD) algorithm is presented for variable-level noise environments. Since the energy of different types of noise is concentrated in different frequency subbands, the effect of the corrupting noise on each frequency subband is different. It is found that the seriously obscured frequency subbands have little word signal information left, and...
The Self-adaptive Voice Activity Detection Algorithm based on time- frequency Parameters
To address the poor performance and weak self-adaptation of traditional voice activity detection algorithms in environments with a low signal-to-noise ratio (SNR), a new self-adaptive voice activity detection algorithm based on time-frequency (TF) parameters is put forward. After introducing the time-domain log-energy and improved mel-scale log-energy, the new TF parameters are acqui...
Robust voice activity detection using perceptual wavelet-packet transform and Teager energy operator
In this letter, a robust voice activity detection (VAD) algorithm is presented. This proposed VAD algorithm makes use of the perceptual wavelet-packet transform and the Teager energy operator to compute a robust parameter called voice activity shape for VAD. The main advantage of this algorithm is that the preset threshold values or a priori knowledge of the SNR usually needed in conventional V...